
Look up MSA files relative to input fasta/yaml #500

Open
Emrys-Merlin wants to merge 1 commit into jwohlwend:main from Emrys-Merlin:fix/relative_msas

Emrys-Merlin commented Jul 22, 2025

At the moment, relative MSA paths appear to be resolved against the current working directory. I would like to propose a change so that a relative MSA path is looked up relative to the directory of the input file instead. That way, I do not need to update the input FASTA/YAML when I run the model from a different directory. For me, this would make data exchange much easier, as I only need to preserve the directory layout when sharing files.

I understand that this changes the user interface and that there might be good reasons for the current behavior. As the code changes were rather minimal, I decided to go ahead and open a PR. At the same time, I understand if further discussion is necessary, and I am completely open to the conclusion that this proposal should be disregarded.
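The proposed lookup rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR or in Boltz's `main.py`: `resolve_msa_path` is a hypothetical helper, absolute paths are assumed to be kept unchanged, and relative paths are joined onto the input file's directory rather than the current working directory.

```python
from pathlib import Path


def resolve_msa_path(msa: str, input_file: Path) -> Path:
    """Hypothetical helper illustrating the proposed MSA lookup.

    Absolute MSA paths are returned unchanged. Relative MSA paths are
    joined onto the directory containing the input FASTA/YAML file,
    instead of being interpreted relative to the current working directory.
    """
    msa_path = Path(msa)
    if msa_path.is_absolute():
        return msa_path
    # input_file may itself be relative (e.g. data/yamls/relative.yaml);
    # its parent is then still interpreted against the cwd, which is correct.
    return input_file.parent / msa_path


print(resolve_msa_path("../msas/1e73__1__2.A__2.I.a3m", Path("/data/yamls/relative.yaml")))
```

Note that `pathlib` does not collapse `..` when joining, so the result is `/data/yamls/../msas/1e73__1__2.A__2.I.a3m`; the operating system resolves it to `/data/msas/...` when the file is opened.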

Example

Directory structure

data
├── fastas
│   └── relative.fasta
├── msas
│   └── 1e73__1__2.A__2.I.a3m
└── yamls
    └── relative.yaml

FASTA input

>A|protein|../msas/1e73__1__2.A__2.I.a3m
DEEITCQENLPFTCGNTDALNSSSFSSDFIFGVASSAYQIEGTIGRGLNIWDGFTHRYPNKSGPDHGNGDTTCDSFSYWQKDIDVLDELNATGYRFSIAWSRIIPRGKRSRGVNEKGIDYYHGLISGLIKKGITPFVTLFHWDLPQTLQDEYEGFLDPQIIDDFKDYADLCFEEFGDSVKYWLTINQLYSVPTRGYGSALDAPGRCSPTVDPSCYAGNSSTEPYIVAHHQLLAHAKVVDLYRKNYTHQGGKIGPTMITRWFLPYNDTDRHSIAATERMKEFFLGWFMGPLTNGTYPQIMIDTVGERLPSFSPEESNLVKGSYDFLGLNYYFTQYAQPSPNPVNSTNHTAMMDAGAKLTYINASGHYIGPLFEKDKADSTDNIYYYPKGIYSVMDYFKNKYYNPLIYVTENGISTPGDENRNQSMLDYTRIDYLCSHLCFLNKVIKEKDVNVKGYLAWALGDNYEFNKGFTVRFGLSYIDWNNVTDRDLKKSGQWYQSFISP
>B|smiles
CC(=O)N[C@H]1CO[C@H](CO)[C@@H](O)[C@@H]1O
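In the FASTA header above, the MSA path is the third `|`-separated field (`>ID|TYPE|MSA`), and the ligand entry has no such field. The sketch below shows how that field could be extracted and resolved against the FASTA file's directory under the proposed behavior. `msa_from_header` is a hypothetical helper written for illustration; it is not Boltz's FASTA parser.

```python
from pathlib import Path
from typing import Optional


def msa_from_header(header: str, fasta_path: Path) -> Optional[Path]:
    """Hypothetical helper: read the MSA field of a '>ID|TYPE|MSA' header.

    Returns None when there is no third field (e.g. '>B|smiles').
    A relative MSA path is joined onto the FASTA file's directory,
    which is the lookup proposed in this PR.
    """
    fields = header.lstrip(">").strip().split("|")
    if len(fields) < 3 or not fields[2]:
        return None
    msa = Path(fields[2])
    return msa if msa.is_absolute() else fasta_path.parent / msa


fasta = Path("/data/fastas/relative.fasta")
print(msa_from_header(">A|protein|../msas/1e73__1__2.A__2.I.a3m", fasta))
print(msa_from_header(">B|smiles", fasta))
```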

YAML input

version: 1
sequences:
  - protein:
      id: A
      sequence: DEEITCQENLPFTCGNTDALNSSSFSSDFIFGVASSAYQIEGTIGRGLNIWDGFTHRYPNKSGPDHGNGDTTCDSFSYWQKDIDVLDELNATGYRFSIAWSRIIPRGKRSRGVNEKGIDYYHGLISGLIKKGITPFVTLFHWDLPQTLQDEYEGFLDPQIIDDFKDYADLCFEEFGDSVKYWLTINQLYSVPTRGYGSALDAPGRCSPTVDPSCYAGNSSTEPYIVAHHQLLAHAKVVDLYRKNYTHQGGKIGPTMITRWFLPYNDTDRHSIAATERMKEFFLGWFMGPLTNGTYPQIMIDTVGERLPSFSPEESNLVKGSYDFLGLNYYFTQYAQPSPNPVNSTNHTAMMDAGAKLTYINASGHYIGPLFEKDKADSTDNIYYYPKGIYSVMDYFKNKYYNPLIYVTENGISTPGDENRNQSMLDYTRIDYLCSHLCFLNKVIKEKDVNVKGYLAWALGDNYEFNKGFTVRFGLSYIDWNNVTDRDLKKSGQWYQSFISP
      msa: ../msas/1e73__1__2.A__2.I.a3m
  - ligand:
      id: B
      smiles: CC(=O)N[C@H]1CO[C@H](CO)[C@@H](O)[C@@H]1O
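For the YAML input, the same idea amounts to rewriting each relative `msa` field against the YAML file's directory. The sketch below operates on the kind of dictionary `yaml.safe_load` would return for the file above (PyYAML is not imported, to keep the example self-contained). `rewrite_msa_paths` is hypothetical, and skipping the value `empty` assumes the Boltz convention of `msa: empty` for single-sequence mode.

```python
from pathlib import Path
from typing import Any


def rewrite_msa_paths(spec: dict[str, Any], yaml_path: Path) -> dict[str, Any]:
    """Hypothetical helper: return a copy of a parsed Boltz-style YAML spec
    with relative MSA paths joined onto the YAML file's directory."""
    base = yaml_path.parent
    sequences = []
    for entry in spec.get("sequences", []):
        new_entry = {}
        for entity_type, fields in entry.items():
            fields = dict(fields)  # shallow copy so the input spec is not mutated
            msa = fields.get("msa")
            # Leave missing MSAs, the 'empty' keyword, and absolute paths alone.
            if isinstance(msa, str) and msa != "empty" and not Path(msa).is_absolute():
                fields["msa"] = str(base / msa)
            new_entry[entity_type] = fields
        sequences.append(new_entry)
    return {**spec, "sequences": sequences}


spec = {
    "version": 1,
    "sequences": [
        {"protein": {"id": "A", "sequence": "DEEI", "msa": "../msas/1e73__1__2.A__2.I.a3m"}},
        {"ligand": {"id": "B", "smiles": "CC(=O)N[C@H]1CO[C@H](CO)[C@@H](O)[C@@H]1O"}},
    ],
}
print(rewrite_msa_paths(spec, Path("/data/yamls/relative.yaml")))
```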

Run command

boltz predict /data/fastas/relative.fasta

or

boltz predict /data/yamls/relative.yaml

I am happy to provide the MSA file; it seems to be too large to attach to this post.

Expected behavior

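The prediction should run successfully, with ../msas/1e73__1__2.A__2.I.a3m resolved relative to the directory containing the input file, i.e. to data/msas/1e73__1__2.A__2.I.a3m.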

Observed behavior

Traceback (most recent call last):
  File "/xxx/xxx/boltz/src/boltz/main.py", line 608, in process_input
    raise FileNotFoundError(msg)  # noqa: TRY301
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: MSA file ../msas/1e73__1__2.A__2.I.a3m not found.
Failed to process /data/boltz_msa_example/yamls/relative.yaml. Skipping. Error: MSA file ../msas/1e73__1__2.A__2.I.a3m not found..
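The error above can be reproduced without Boltz. The sketch below rebuilds the example directory layout in a temporary directory, switches to a different working directory, and compares a cwd-relative lookup with the proposed input-relative lookup. The MSA file content is a placeholder, and the `demo` function is purely illustrative.

```python
import os
import tempfile
from pathlib import Path

MSA_NAME = "1e73__1__2.A__2.I.a3m"


def demo() -> tuple[bool, bool]:
    """Return (cwd_lookup_found, input_relative_lookup_found) for the example layout."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "data"
        (root / "msas").mkdir(parents=True)
        (root / "yamls").mkdir()
        (root / "msas" / MSA_NAME).write_text(">A\nDEEI\n")  # placeholder MSA content
        yaml_path = root / "yamls" / "relative.yaml"  # the YAML itself is not read here
        run_dir = Path(tmp) / "run"  # a working directory other than data/yamls
        run_dir.mkdir()
        msa = Path("..") / "msas" / MSA_NAME
        try:
            os.chdir(run_dir)
            cwd_found = msa.exists()  # resolves to <tmp>/msas/..., which does not exist
            input_found = (yaml_path.parent / msa).exists()  # resolves to <tmp>/data/msas/...
        finally:
            os.chdir(old_cwd)  # restore cwd before the temporary directory is removed
    return cwd_found, input_found


print(demo())  # (False, True)
```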
